Data Structures
Author: Jose 胡冠洲 @ ShanghaiTech
Arrays
Simple Array
Definition
Performance
- Access $k$-th entry: $\Theta(1)$
- Insert or erase at:
  - Front: $\Theta(n)$
  - $k$-th: $O(n)$
  - Back: $\Theta(1)$
Linked Lists
Node-based Impl.
Definition
Performance
Achieved with the help of the `list_tail` pointer.
Array-based Impl.
Definition
- `list_head` points to first index
- Every cell points to next index
- Tail cell contains `NULL`
- `stack_top` points to first empty index
- Every empty cell points to next empty index
- Last empty cell contains `NULL`
Operations
- Pushing & Popping
  - Insert at `stack_top` / Push freed cell into stack
  - Remember to modify `stack_top` & `list_head`!
- Reallocation
  - Enlarge: remember to update `capacity` & `stack_top`
  - Shrink: remember to update all members
Doubly-linked List
Definition
Performance
Stacks
Last in, First out (LIFO).
- Two principal operations, both $\Theta(1)$:
  - `push` to the top
  - `pop` from the top
Singly-linked List Impl.
Definition
template <typename Type>
class Stack {
  private:
    Single_list<Type> list;
  public:
    bool empty() const;
    Type top() const;
    void push( Type const & );
    Type pop();
};
One-ended Array Impl.
Operations
- Enlarging Schemes
  - Capacity `+= 1` every time:
    - Each push takes $\Theta(n)$ time
    - Wasted space $\Theta(1)$
  - Capacity `*= 2` every time:
    - Each push takes amortized $\Theta(1)$ time
    - Wasted space $\Theta(n)$
Applications of Stacks
- XML matching
- C++ Parsing
- Function calls
- Post-fix (Reverse-Polish) notation
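The post-fix application can be sketched with a stack: operands are pushed, and each operator pops its two operands. A minimal sketch (the function name and the `+ - *` operator set are illustrative assumptions):

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <string>

// Evaluate a space-separated post-fix (Reverse-Polish) expression.
// Assumes well-formed input consisting of integers and + - *.
int evalPostfix(const std::string &expr) {
    std::stack<int> st;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*") {
            int b = st.top(); st.pop();   // right operand
            int a = st.top(); st.pop();   // left operand
            if (tok == "+")      st.push(a + b);
            else if (tok == "-") st.push(a - b);
            else                 st.push(a * b);
        } else {
            st.push(std::stoi(tok));      // operand
        }
    }
    return st.top();                      // final result
}
```

The stack depth at any moment equals the number of pending operands, which is why a single pass suffices.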
Queues
First in, First out (FIFO).
- Two principal operations, both $\Theta(1)$:
  - `enqueue` at the back
  - `dequeue` from the front
Singly-linked List Impl.
Definition
template <typename Type>
class Queue {
  private:
    Single_list<Type> list;
  public:
    bool empty() const;
    Type front() const;
    void enqueue( Type const & );
    Type dequeue();
};
Circular Array Impl.
Definition
template <typename Type>
class Queue {
  private:
    int queue_size;
    int ifront;            // Initially 0.
    int iback;             // Initially -1.
    int array_capacity;
    Type *array;
  public:
    Queue( int = 10 );
    ~Queue();
    bool empty() const;
    Type front() const;
    void enqueue( Type const & );
    Type dequeue();
};
Operations
- Enlarging Schemes
  - Solution 1:
  - Solution 2:
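The wrap-around behaviour of `ifront`/`iback` can be sketched as follows (fixed capacity for brevity; a full implementation would also enlarge the array when full, as the schemes above discuss):

```cpp
#include <cassert>
#include <vector>

// Minimal circular-array queue matching the Queue definition above:
// both indices advance modulo the capacity.
class CircularQueue {
    std::vector<int> array;
    int queue_size = 0, ifront = 0, iback = -1;
public:
    explicit CircularQueue(int cap = 10) : array(cap) {}
    bool empty() const { return queue_size == 0; }
    int front() const { return array[ifront]; }
    void enqueue(int v) {
        iback = (iback + 1) % (int)array.size();   // wrap around
        array[iback] = v;
        ++queue_size;
    }
    int dequeue() {
        int v = array[ifront];
        ifront = (ifront + 1) % (int)array.size(); // wrap around
        --queue_size;
        return v;
    }
};
```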
Double-ended Queue (Deque)
Definition
Trees: Introduction
Terms & Properties
- Terms
- Root, Leaf, Internal Nodes (including Root)...
- Path, Depth (length of path from root)...
- Height: maximum depth; count # of edges, NOT nodes
  - Only Root $\Rightarrow 0$
  - Empty $\Rightarrow -1$
- Ancestor, Descendant (including the Node itself)
- Properties
- Recursive definition: Subtrees
- Each Node, other than Root, has exactly one node pointing to it
- No Loops: $n$ nodes $\rightarrow$ $n-1$ edges
- Detach first before Attaching
Tree Traversal
- BFS (Breadth-First Traversal): use Queue, $\Theta(n)$
- DFS (Depth-First Traversal): use Recursion / Stack, $\Theta(n)$
  - Pre-order, mark when first visited: A, B, C, D, E, F, G, H, I, J, K, L, M
  - Post-order, mark when leaving it: D, C, F, G, E, B, J, K, L, I, M, H, A
Forest
Definition
- Collection of Disjoint Rooted Trees
Traversal can be achieved by treating the roots as children of a Notional Root.
Trees: Binary Tree
Definition
- Notations
- Full binary tree: each node is full or a leaf ($\ne$ Complete)
- Complete: all left-most nodes are filled
- Perfect: all leaves at same depth; all other nodes are full
  - $2^{h+1}-1$ nodes
  - Height $h = \lg{(n+1)}-1$
template <typename Type>
class Binary_node {
  protected:
    Type element;
    Binary_node *left_tree;
    Binary_node *right_tree;
  public:
    Binary_node( Type const & );
    Type retrieve() const;
    Binary_node *left() const;
    Binary_node *right() const;
    bool is_leaf() const;
    int size() const;
};
Operations
- Traversals
- Pre-order
- In-order
- Post-order
- NOTE: How many different forms can a binary tree with $n$ nodes have? A: Catalan #: $\binom{2n}{n} - \binom{2n}{n-1}$
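The three traversal orders can be sketched against a tiny node struct (a plain stand-in for the `Binary_node` class above; each visit appends the node's label so the orders are easy to compare):

```cpp
#include <cassert>
#include <string>

// Minimal binary-tree node for demonstrating the three DFS orders.
struct Node {
    char element;
    Node *left, *right;
    Node(char e, Node *l = nullptr, Node *r = nullptr)
        : element(e), left(l), right(r) {}
};

void preOrder(Node *t, std::string &out) {
    if (!t) return;
    out += t->element;            // visit root first
    preOrder(t->left, out);
    preOrder(t->right, out);
}
void inOrder(Node *t, std::string &out) {
    if (!t) return;
    inOrder(t->left, out);
    out += t->element;            // visit root in the middle
    inOrder(t->right, out);
}
void postOrder(Node *t, std::string &out) {
    if (!t) return;
    postOrder(t->left, out);
    postOrder(t->right, out);
    out += t->element;            // visit root last
}
```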
Expression Tree
- Use Post-ordering DFS to get Reverse-Polish format
- Use In-order Traversal to get Infix format
- Need to add Brackets!!!
  - Add "$($" before entering a subtree whose root operator has lower precedence than its parent operator
  - Add "$)$" after leaving that subtree
Complete Binary Tree
parent = k >> 1; left_child = k << 1; right_child = left_child + 1;
Left-child Right-sibling Binary Tree
- Knuth Transform: Store general tree as Binary Trees.
- Empty left sub-tree $\Rightarrow$ no children
- Empty right sub-tree $\Rightarrow$ last among its siblings
- Pre-order traversal identical
- Post-order traversal of original tree $=$ in-order traversal of the binary tree
For forests, consider all roots to be siblings.
Trees: Binary Heaps
First in, Highest priority out (A specific implementation of Priority Queues).
Definition
Take a Min-heap for example:
- Key of Root $\le$ keys of subtrees
- Subtrees are also Min-heaps
- Usually, use Complete Binary Trees to ensure balance $\Rightarrow$ Space: $O(n)$
Operations
Pop Root:
- Remove root, replace with the last node
- From the new root, (percolate)
- If smaller than both children, DONE
- Else, Swap with smaller child
- Go down and Recurse
Push (Insert):
- Add to the first empty slot
- From the inserted node, (percolate)
- If bigger than parent, DONE
- Else, Swap with parent
- Go up and Recurse
Build Heap - Floyd's Method:
- for $i = \frac{size}{2}$ down to $0$:
  - Percolate down `array[i]`

Observations on this method:
- We can directly use an Array to represent a Heap
- For a Complete Tree, $\lceil \frac{size}{2} \rceil$ nodes are Leaf Nodes
- Only the Non-leaf Nodes need percolation $\Rightarrow$ time complexity is $\Theta(n)$
Heapsort
- Use Floyd's Method to build a max-heap as array
- Pop the root
- This will make an empty space near end of array
- Put the popped element there
- Repeat until finish
Huffman Coding
- Scan text, count frequencies
- Build Huffman-Tree
- Pick smallest two
- Combine
- Push back
- Traverse through Huffman-Tree to determine code
- Left gets $0$, Right gets $1$
- Go through text to encode
Trees: Binary Search Trees (BST)
Definition
- Left sub-tree (if any) is a BST and all Elements are less than the Root
- Right sub-tree (if any) is a BST and all Elements are larger than the Root
Worst case of a BST: a degenerate chain of height $n-1$ (e.g. from inserting already-sorted input).
Operations
Find Minimum (Maximum): ($O(h)$)
- Go to the left- (right-) most node

Find: ($O(h)$)
- Do Binary Search
- If an empty node is reached, return NULL

Insert: ($O(h)$)
- Do `find`
- If found, return NULL
- Else insert at that empty node
Find Successor:
- If it has a right subtree, do `find_minimum` in the right subtree
- Else, go up toward the root and find the first larger object on this path
Delete:
- Case 1: leaf, delete directly
- Case 2: has one child, replace by this child
- Case 3: has two children
- Find the Successor
- Replace by the successor
- Delete the successor
Find $k$-th Entry:
- If `size(left_subtree)` $== k$, return current node
- If `size(left_subtree)` $> k$, go to left subtree, and Recurse
- Else, go to right subtree
  - Recurse on finding the $(k - $ `size(left_subtree)` $- 1)$-th entry
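The operations above can be sketched in one small node struct; caching each node's subtree size (an implementation choice, not from the notes) makes `size(left_subtree)` an $O(1)$ lookup for the $k$-th-entry search:

```cpp
#include <cassert>

struct BstNode {
    int key, size;               // size = # of nodes in this subtree
    BstNode *left, *right;
    BstNode(int k) : key(k), size(1), left(nullptr), right(nullptr) {}
};

BstNode *insert(BstNode *t, int key) {
    if (!t) return new BstNode(key);
    if (key < t->key)      t->left = insert(t->left, key);
    else if (key > t->key) t->right = insert(t->right, key);
    else return t;               // duplicate: reject, sizes unchanged
    t->size = 1 + (t->left ? t->left->size : 0)
                + (t->right ? t->right->size : 0);
    return t;
}

bool find(const BstNode *t, int key) {       // binary search
    if (!t) return false;
    if (key == t->key) return true;
    return find(key < t->key ? t->left : t->right, key);
}

int kth(const BstNode *t, int k) {           // 0-based k-th smallest
    int ls = t->left ? t->left->size : 0;
    if (k == ls) return t->key;
    if (k < ls)  return kth(t->left, k);
    return kth(t->right, k - ls - 1);        // skip left subtree and root
}
```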
Trees: AVL Tree
Definition
AVL Balanced means:
- $|h(l)-h(r)| \le 1$
- Both left subtree and right subtree are balanced

Must-Know Patterns
- # of nodes upper bound $= 2^{h+1}-1$
- # of nodes lower bound: $F(h) = F(h-1) + 1 + F(h-2)$, i.e. $F(h) = \text{Fibonacci}[h+3]-1$
Operations
Insertion:
- Follow Binary Tree Convention
- Check for imbalance; find the LOWEST unbalanced node:
  - L-L (symmetric to R-R) $\Rightarrow$ single rotation
  - L-R (symmetric to R-L) $\Rightarrow$ double rotation
Deletion
- Follow Binary Tree Convention
- Trace back to the root, checking for imbalance!
- Rotate to fix, then go upward
- Need to check every node on this path! (including the Root)
Trees: Red-Black Tree (RBT)
Definition
- Each node Red / Black (1 bit)
- Null Path: path starting from the root where the last node is not full
- Restrictions:
- Root must be black
- Red node can only have black children
- Each null path has the same # of black nodes
Must-Know Patterns
- Red node must be either full / leaf
- If node P has exactly one child Q:
- Q must be red
- Q must be leaf
- P must be black
- Worst case RB Trees
  - $k$: # of black nodes per null path; $h$: height
  - $F(k)$ (total # of nodes): $F(k) = F(k-1)+2+2(2^{k-1}-1)$
  - $h_{worst} = 2 \lg{(n+2)}-3$
Operations
Bottom-Up Insertion:
- MUST insert a red node; otherwise the rule that each null path has the same # of black nodes would be violated
- Find the place where it will be inserted
- If parent is black, DONE
- Else if parent is Red
- If the grandparent has only one child: Then DONE
- If the grandparent has two children, the other one (uncle) must be red: Then Recurse!
- Recursion Case 1: Then DONE
- Recursion Case 2: Then DONE
- Recursion Case 3: Then Continue Recursion!
- If the root becomes Red at last, Colour it Black
Top-Down Insertion:
- From the root, every step downward:
- Check: is the current node black AND are both children red?
  - If true $\Rightarrow$ swap colours
  - If the root becomes Red, colour it Black
- Check: after swapping, are parent AND self both Red?
  - If true, do a Rotation!
- When the bottom is reached, at most one more rotation is needed
Deletion:
When deleting a node:
- If Node is Leaf, delete it
- If Node is internal, replace it with its Successor, then actually delete the Successor
Therefore the patterns can ONLY be:
- Deleting a Red leaf: DONE
- Deleting a Black node with a Red leaf child: replace the value, then DONE
- Deleting a Black leaf (not the simplest case, but the most intuitive):
  - Sibling Red:
  - Sibling Black & Parent Red:
  - Sibling Black & Parent Black:
    - Now the whole subtree (rooted at P) lacks a black node; Recurse!
  - Sibling Black & has left child:
  - Sibling Black & has only right child:
Hash Tables
Scenario: error codes vary over the range 0~65536, but there are only 150 different error conditions in total
- Solution 1 - Array of length 150: Slow to locate an error code (Binary search)
- Solution 2 - Array of length 65536: Large memory space wasted
- Solution 3 - Hash Table
Mapping objects onto numbers
- Predetermined, e.g. Student ID #
  - May make two equivalent objects have different hash values
- Arithmetic, e.g. a deterministic function
  - For rational #s: define $\frac{p}{q} \Rightarrow p + q$
    - Problem 1 - $\frac{1}{2}$ and $\frac{2}{4}$ hash differently: divide by $\gcd$ first
    - Problem 2 - $\frac{1}{2}$ and $\frac{-1}{-2}$ hash differently: use $abs$ form
  - For strings: let the individual characters be the coefficients of a polynomial in $x$
Hash Functions
The process of mapping a number onto an integer index in a given range.
- Requirements
  - Must run in $O(1)$ time
  - Output is deterministic: always the same output for the same input
  - Collisions may occur
- Types
  - Modulus: $H(n) = n \% M =$ `n & ((1 << m) - 1)` when $M = 2^m$
    - Fast: just take the last $m$ bits
  - Multiplicative: $H(n) = \text{certain } m \text{ bits of } n \cdot C =$ `((C * n) >> shift) & ((1 << m) - 1)`
  - ...
- Modulus:
Collisions Dealing: Chained List
Use linked lists to store collisions.
- Operations
  - `push_front` to the list head every time
  - Search now goes through the list ($O(\lambda)$)
- Load Factor $\lambda = \frac{n}{M}$ represents the length of the linked lists
  - If $\lambda$ goes high, re-hash (double the table size and re-insert all elements)
  - Choose hash functions that avoid clustering!
- Could replace each linked list with an AVL tree
Collisions Dealing: Open Addressing - Linear Probing
Operations
Insert:
- If bin $k$ is empty, occupy it
- Otherwise, go to bin $k + 1$, $k + 2$... until an empty bin is found (Circular array!)
Search:
- Start from bin $k$, search forward until:
  - item found $\Rightarrow$ `true`
  - empty bin found $\Rightarrow$ `false`
  - traversed entire array $\Rightarrow$ `false`
Erase:
- Erase slot $k$, making a hole at bin $k$ (`hole = k`)
- Repeat:
  - Go to the next adjacent bin $k'$ (occupied by element $n$)
  - If $H(n)$ is at or before `hole` but after $k'$:
    - Move $n$ into `hole`, creating a new hole (`hole = k'`)
- Until the next bin is empty
Lazy Erasing:
- Erase slot $k$: mark it as `ERASED`
- When a search meets `ERASED`, regard it as occupied
- When an insertion meets `ERASED`, regard it as un-occupied
  - Need to search before insertion to avoid duplicate elements
- When calculating $\lambda$, regard `ERASED` as occupied
Analysis
- Primary Clustering: probability of increasing the length of a length-$l$ cluster $= \frac{l+2}{M}$
- # of probes for a successful search: $\frac{1}{2} (1 + \frac{1}{1 - \lambda})$
- # of probes for an unsuccessful search: $\frac{1}{2} (1 + \frac{1}{(1 - \lambda)^2})$
- Keep $\lambda$ under a certain bound
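The insert/search/lazy-erase rules above can be sketched in one small table class (fixed size, `int` keys; names and the three-state enum are illustrative):

```cpp
#include <cassert>
#include <vector>

// Linear-probing hash table with lazy erasing: ERASED bins are skipped by
// search but reusable by insert, as described above.
class LinearProbeTable {
    enum State { EMPTY, OCCUPIED, ERASED };
    std::vector<int> bins;
    std::vector<State> state;
    int M;
    int hash(int n) const { return ((n % M) + M) % M; }
public:
    explicit LinearProbeTable(int capacity)
        : bins(capacity), state(capacity, EMPTY), M(capacity) {}

    bool search(int n) const {
        for (int i = 0, k = hash(n); i < M; ++i, k = (k + 1) % M) {
            if (state[k] == EMPTY) return false;   // truly empty bin: stop
            if (state[k] == OCCUPIED && bins[k] == n) return true;
        }                                          // ERASED: keep probing
        return false;
    }
    void insert(int n) {
        if (search(n)) return;                     // avoid duplicates
        for (int i = 0, k = hash(n); i < M; ++i, k = (k + 1) % M)
            if (state[k] != OCCUPIED) {            // EMPTY or ERASED reusable
                bins[k] = n; state[k] = OCCUPIED; return;
            }
    }
    void erase(int n) {
        for (int i = 0, k = hash(n); i < M; ++i, k = (k + 1) % M) {
            if (state[k] == EMPTY) return;
            if (state[k] == OCCUPIED && bins[k] == n) { state[k] = ERASED; return; }
        }
    }
};
```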
Collisions Dealing: Open Addressing - Quadratic Probing
Move forward by different amounts every time.
- Using $M = 2^m$: step forward $1$ at the first probe, then $2$, $3$...
- Must use Lazy Erasing
- Analysis
  - Secondary Clustering: objects hashing to the same bin follow the same probe sequence
  - # of probes for a successful search: $\frac{1}{\lambda} \ln{\frac{1}{1 - \lambda}}$
  - # of probes for an unsuccessful search: $\frac{1}{1 - \lambda}$
Disjoint Sets (Union-Find Set)
- Partition elements according to equivalence relations
- Use a representative to represent all elements in set
- `find(a)` returns the representative of the set that `a` is in
- `union(a, b)` unions the two sets containing `a`, `b`
Array-based impl.
Definition
Performance
- `find` takes $\Theta(1)$
- `union` takes $\Theta(n)$
Tree-based impl.
Definition
size_t find( size_t i ) {
    if ( parent[i] != i )
        parent[i] = find( parent[i] );   // path compression
    return parent[i];
}

// "union" is a reserved keyword in C++, so name the operation set_union.
void set_union( size_t i, size_t j ) {
    i = find( i );
    j = find( j );
    if ( i != j )
        parent[j] = i;
}
Performance
- `find` takes $\Theta(h)$
  - Apply path compression
- `union` takes $\Theta(h)$
  - Point the root of the shorter tree to the taller one

Worst case: Pascal's Triangle
- Depth is $\frac{\ln{n}}{2} = O(\ln{n})$ without path compression
- Depth is $O(\alpha(n))$ with path compression
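The find/union members above can be exercised as a self-contained sketch (the operation is named `merge` here, since `union` is a C++ keyword):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tree-based disjoint sets with path compression.
struct DisjointSets {
    std::vector<std::size_t> parent;
    explicit DisjointSets(std::size_t n) : parent(n) {
        for (std::size_t i = 0; i < n; ++i) parent[i] = i;  // each its own set
    }
    std::size_t find(std::size_t i) {
        if (parent[i] != i)
            parent[i] = find(parent[i]);   // path compression
        return parent[i];
    }
    void merge(std::size_t i, std::size_t j) {
        i = find(i); j = find(j);
        if (i != j) parent[j] = i;         // point one root to the other
    }
};
```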
Application of Disjoint Sets: Maze Generation
- Divide the maze into square cells, each surrounded by four walls
- Make every cell a disjoint set
- Repeat:
- Randomly choose a wall
- If it connects two disjoint sets of cells, remove it, union two sets
- Until all cells in one set
Graphs: Introduction
Categories
Undirected Graph
- Vertex & Edge
- Assume {$v_1, v_1$} (a self-loop) is not an edge
- $|E|_{max} = \frac{|V| (|V| - 1)}{2} = O(|V|^2)$
- Neighbours: adjacent vertices
- Degree of a vertex: # of neighbours
- Subgraph $G_s = (V_s, E_s)$ where $V_s$ is a subset of $V$, $E_s$ is a subset of $E$
  - Vertex-included sub-graph $G' = (V, E_s)$
- Path from $v_0$ to $v_k$
  - Length of a path is the # of edges it passes through
  - A Simple Path has no repeated vertices (except maybe $v_0 = v_k$)
  - A Simple Cycle is a simple path with $v_0 = v_k$
- Connected: there exists a path between the two vertices
- A Connected Graph has a path between any pair of vertices
- A Connected Component is a maximum connected subgraph
- Weight of an edge might be assigned
- Length of a weighted path is Sum of edge weights it passes through
- A Tree is:
- Connected & There is a unique path between any two vertices
- Has exactly $n - 1$ edges for $n$ nodes
- Acyclic
- Removing any edge creates two unconnected sub-graphs
- A Forest is any graph that is Acyclic
  - # of trees in a forest $= |V| - |E|$
Directed Graph
$|E|_{max} = |V| (|V| - 1) = O(|V|^2)$
- In Degree of a vertex: # of inward edges
  - In degree $= 0$ $\Rightarrow$ Source
- Out Degree of a vertex: # of outward edges
  - Out degree $= 0$ $\Rightarrow$ Sink
- Connectivity:
- Strongly Connected: there exists a directed path from and to any two vertices
- Weakly Connected: view it as undirected and then connected
- Directed Acyclic Graphs (DAG)
Representations
Adjacency Matrix
- Symmetric for undirected graphs
- $M[a,a] = 0$; $M[a,b] = \infty$ if $a, b$ are not connected
- Suitable for Dense Graphs
Adjacency List
- For undirected graphs, consider every edge to be doubly directed
- Suitable for Sparse Graphs
Graphs: Traversal (Searching)
Breadth-First Search
Uses a Queue, $\Theta(|V| + |E|)$:
- Choose a vertex, mark it as VISITED, push it into an empty queue
- Repeat:
  - Pop the queue head $v$
  - For each neighbour $u$ of $v$ that is NOT VISITED:
    - Mark $u$ as VISITED
    - Set $u$'s parent to be $v$
    - Push $u$ into the queue
- Until the queue is empty
- If all vertices are visited now, the graph is Connected
- Parent pointers form a BFS Tree
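The steps above can be sketched on an adjacency list; the returned parent array encodes the BFS tree (`-1` marks the source and any unreached vertex):

```cpp
#include <cassert>
#include <queue>
#include <vector>

// BFS from source s; parent pointers form the BFS tree.
std::vector<int> bfs(const std::vector<std::vector<int>> &adj, int s) {
    int n = (int)adj.size();
    std::vector<bool> visited(n, false);
    std::vector<int> parent(n, -1);
    std::queue<int> q;
    visited[s] = true;
    q.push(s);
    while (!q.empty()) {
        int v = q.front(); q.pop();        // pop queue head
        for (int u : adj[v])
            if (!visited[u]) {
                visited[u] = true;
                parent[u] = v;             // parent pointer into the BFS tree
                q.push(u);
            }
    }
    return parent;
}
```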
Depth-First Search
Uses a Stack, $\Theta(|V| + |E|)$:
- Choose a vertex, mark it as VISITED, push it into an empty stack
- Repeat:
  - If the vertex at the stack top $v$ has a NOT VISITED neighbour $u$:
    - Mark $u$ as VISITED
    - Set $u$'s parent to be $v$
    - Push $u$ into the stack
  - Otherwise, pop $v$
- Until the stack is empty
- If all vertices are visited now, the graph is Connected
- Parent pointers form a DFS Tree
Applications of BFS & DFS:
- Finding Connected Components
- Determining Distances from a source
  - Use BFS, $s.d = 0$
  - At every parent setting, $u.d = u.parent.d + 1$
- Testing Bipartiteness
  - Use BFS, alternately colour every layer
- Finding Strongly-Connected Components - Kosaraju's SCC Algorithm:
  - DFS($G$), record Discovery-time and Finish-time
  - Reverse all edges in $G$ to get $G^T$
  - DFS($G^T$), picking start nodes in decreasing order of Finish-time
  - Each DFS Tree formed in the second DFS is an SCC!
Topological Sort
An ordering of the vertices of a DAG s.t. every edge points from an earlier vertex to a later one.
- Examples
- Taking courses in SIST
- Wearing clothes
- Having a Topological Sort $\Leftrightarrow$ DAG
In-degree Based Procedure
- Use an array to store in-degrees of all vertices
- Find a vertex $v$ with in-degree $0$
- Repeat:
  - Remove $v$, update vertices' in-degrees
  - $v \leftarrow$ a vertex with in-degree $0$
    - If none found, not a DAG
- Until all vertices picked
Performance
- Space: $\Theta(|V|)$
- Time:
  - Searching the array for in-degree $0$: $O(|V|^2)$
  - Improvement: every time a vertex's in-degree is updated to $0$, push it into a queue: $\Theta(|V| + |E|)$
    - The queue initially contains all in-degree-$0$ vertices
    - The array used for the queue stores exactly the ordering!
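The queue-based $\Theta(|V|+|E|)$ variant can be sketched as follows (returns an empty vector when the graph is not a DAG):

```cpp
#include <cassert>
#include <queue>
#include <vector>

// In-degree based topological sort with a queue.
std::vector<int> topoSort(const std::vector<std::vector<int>> &adj) {
    int n = (int)adj.size();
    std::vector<int> indeg(n, 0), order;
    for (int v = 0; v < n; ++v)
        for (int u : adj[v]) ++indeg[u];
    std::queue<int> q;
    for (int v = 0; v < n; ++v)
        if (indeg[v] == 0) q.push(v);       // initially all in-degree-0 vertices
    while (!q.empty()) {
        int v = q.front(); q.pop();
        order.push_back(v);
        for (int u : adj[v])
            if (--indeg[u] == 0) q.push(u); // push when in-degree drops to 0
    }
    if ((int)order.size() != n) return {};  // some vertex never freed: cycle
    return order;
}
```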
Finding Critical Paths
Every node (task) has a computing time.
- Critical Time: Minimum time of completing all tasks in parallel
- Critical time of a task is earliest time that it can be finished
- Critical Path: Sequence that determines the minimum time
- Find a vertex $v$ with in-degree $0$
- Repeat:
  - Remove $v$, update vertices' in-degrees
  - `v.critical_time += v.task_time`
  - For every neighbour $u$ of $v$:
    - If `v.critical_time` $>$ `u.critical_time`:
      - `u.critical_time` $\leftarrow$ `v.critical_time`
      - Set `u.previous` $\leftarrow v$
  - $v \leftarrow$ a vertex with in-degree $0$
- Until all vertices picked
Graphs: Minimum Spanning Tree (MST)
- Spanning Tree: a subgraph that is a tree, and includes all vertices
- Minimum Spanning Tree is the one with minimum weights
- Might have Spanning Forest for un-connected graphs
- Cut Property (MST Property)
Prim's Algorithm
Procedure
- $Vset$ is initially the root node $\{s\}$; $Eset$ is initially $\emptyset$
- $R$ is initially $\{(s, u): \text{for every neighbour } u\}$
- Repeat:
  - Extract the edge $e = (v, u)$ with minimum weight in $R$, where $u$ is the endpoint $\notin Vset$
  - Add $u$ into $Vset$
  - Add $e$ into $Eset$
  - For every edge $(u, w)$ starting from $u$:
    - If $w \notin Vset$, add $(u, w)$ into $R$
- Until $Vset == V$
Performance
- Space: $O(|V|)$
- Time: $O(|E|\ln{|V|})$
  - With a Fibonacci Heap: $O(|E| + |V|\ln{|V|})$
Kruskal's Algorithm
Procedure
- $Eset$ is initially $\emptyset$
- Make every vertex its own disjoint set
- Sort $E$ in non-decreasing order of weights
- For every edge $(u, v)$ in $E$ in order:
  - If $u$ and $v$ are not in the same set:
    - Add $(u, v)$ into $Eset$
    - Union the two sets
Performance
- Space: $O(|E|)$
- Time: $O(|E|\ln{|V|})$
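The sort-then-union procedure can be sketched with an inline union-find; this sketch returns the MST's total weight and assumes a connected graph:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Edge { int w, u, v; };

// Kruskal: take each edge in weight order iff its endpoints are in
// different disjoint sets.
int kruskalWeight(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge &a, const Edge &b) { return a.w < b.w; });
    std::vector<int> parent(n);
    for (int i = 0; i < n; ++i) parent[i] = i;
    auto find = [&](int x) -> int {
        while (parent[x] != x) x = parent[x] = parent[parent[x]]; // halving
        return x;
    };
    int total = 0;
    for (const Edge &e : edges) {
        int ru = find(e.u), rv = find(e.v);
        if (ru != rv) {          // different sets: edge joins two components
            parent[rv] = ru;     // union the two sets
            total += e.w;
        }
    }
    return total;
}
```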
Graphs: Shortest Path
Prerequisites: Weighted Graphs & No Negative-weighted Loops.
Dijkstra's Algorithm
Single Source Shortest Path (SSSP), similar to BFS.
Prerequisites
- No Negative-weighted Edges
- Based on the Triangle Inequality: $\delta(s, u) \le \delta(s, v) + w(v, u)$
Procedure
- Initialize source node $s.d = 0$
- Initialize all other nodes $v.d = \infty$
- $Vset$ is initially $V$
- Repeat:
  - Extract the vertex $v$ with minimum $d$ from $Vset$
  - Mark $v$ as VISITED
  - For every neighbour $u$ of $v$ that is NOT VISITED:
    - If $u.d > v.d + w(u, v)$:
      - $u.d \leftarrow v.d + w(u, v)$
      - Set $u$'s parent to $v$
- Until $Vset$ is empty
Performance
- Time: $O(|E|\ln{|V|})$ in the worst case
Special Cases
- Cannot apply to Negative-weighted Edges:
- Output is not MST:
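The notes extract the minimum-$d$ vertex from $Vset$; a common sketch of that step uses `std::priority_queue` with lazy deletion, skipping stale entries when popped:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Dijkstra from source s. adj[v] holds (neighbour, weight) pairs with
// non-negative weights; returns the distance array d.
std::vector<long> dijkstra(
        const std::vector<std::vector<std::pair<int,int>>> &adj, int s) {
    const long INF = 1e18;
    std::vector<long> d(adj.size(), INF);
    d[s] = 0;
    typedef std::pair<long,int> Item;               // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push(Item(0, s));
    while (!pq.empty()) {
        Item it = pq.top(); pq.pop();
        long dist = it.first; int v = it.second;
        if (dist > d[v]) continue;                  // stale entry: skip
        for (size_t i = 0; i < adj[v].size(); ++i) {
            int u = adj[v][i].first, w = adj[v][i].second;
            if (d[u] > d[v] + w) {                  // relax edge (v, u)
                d[u] = d[v] + w;
                pq.push(Item(d[u], u));
            }
        }
    }
    return d;
}
```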
Bellman-Ford Algorithm
Single Source Shortest Path (SSSP), can detect Negative Loops.
Procedure
- Initialize source node $s.d = 0$
- Initialize all other nodes $v.d = \infty$
- Repeat $|V| - 1$ times:
  - For every directed edge $(v, u) \in E$:
    - If $u.d > v.d + w(u, v)$:
      - $u.d \leftarrow v.d + w(u, v)$
      - Set $u$'s parent to $v$
- If there is still an edge $(v, u)$ where $u.d > v.d + w(u, v)$:
  - The graph has a negative loop
Performance
- Time: $O(|V||E|)$
Floyd-Warshall Algorithm
All Pairs Shortest Paths (APSP).
Procedure
- Let $D^{(0)}$ be the Adjacency Matrix
- Let $Next$ be a matrix with $Next_{ij} = j$ if there's an edge $(i, j)$
- For $k$ from $1$ to $n$:
  - For $i$ from $1$ to $n$:
    - For $j$ from $1$ to $n$:
      - If $D^{(k-1)}_{ij} > D^{(k-1)}_{ik} + D^{(k-1)}_{kj}$:
        - $D^{(k)}_{ij} = D^{(k-1)}_{ik} + D^{(k-1)}_{kj}$
        - $Next_{ij} = Next_{ik}$
      - Else $D^{(k)}_{ij} = D^{(k-1)}_{ij}$
- $D^{(|V|)}$ is the all-pairs shortest-paths matrix
Go through the $Next$ matrix to recover the actual shortest path.
Performance
- Time:
$O(|V|^3)$
Modification for finding connectivity: $T^{(k)}_{ij} = T^{(k-1)}_{ij}\ ||\ (T^{(k-1)}_{ik}\ \&\&\ T^{(k-1)}_{kj})$.
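The triple loop above can be sketched directly on the distance matrix, updating it in place (the $Next$ matrix is omitted here; `INF` marks absent edges):

```cpp
#include <cassert>
#include <vector>

const int INF = 1000000000;   // "no edge" marker

// Floyd-Warshall: D starts as the adjacency matrix (0 on the diagonal)
// and is relaxed through every intermediate vertex k.
std::vector<std::vector<int>> floydWarshall(std::vector<std::vector<int>> D) {
    int n = (int)D.size();
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (D[i][k] < INF && D[k][j] < INF &&
                    D[i][j] > D[i][k] + D[k][j])
                    D[i][j] = D[i][k] + D[k][j];   // route through k wins
    return D;
}
```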
$A^*$ Search
Source-to-Destination Shortest Path.
Prerequisites
Heuristic Function
- $G(x) =$ the minimum cost of moving from known paths to $x$, i.e. $d$ in Dijkstra
- $H(x) =$ Heuristic Function: estimated cost of moving from $x$ to the destination
  - Admissible Heuristic: ensures $H(x) \le \delta(x, dest)$
  - Only an Admissible Heuristic guarantees finding the optimal Shortest Path
- $F(x) = G(x) + H(x)$
Procedure
- Initialize the source node to have $G$-score $0$
- Initialize all other nodes to have $G$-score $\infty$
- $Vset$ is initially $V$
- Repeat:
  - Extract the node $v$ with smallest $F$-score in $Vset$
  - If $v$ is the destination:
    - Path found, END the procedure
  - Mark $v$ as VISITED
  - For every neighbour $u$ of $v$ that is NOT VISITED:
    - If $G(u) > G(v) + d(v, u)$:
      - $G(u) \leftarrow G(v) + d(v, u)$
      - Set $u.parent$ to be $v$
- Until $Vset$ is empty
  - Then assert no path exists
Performance
- Time: depends heavily on $H(x)$
- Degrades to Dijkstra's Algorithm when $H(x) = C$ for all nodes other than the destination (Discrete Distance)
Sorting
Taking a list of objects which could be stored in a linear order, returning a reordering s.t. they are in order.
Notations
- In-place / not
  - In-place: works with only $O(1)$ extra memory
  - Not in-place: requires at least $\Theta(n)$ extra memory
- Run-time
  - Worst-case run time: based on comparisons, CANNOT be faster than $\Theta(n\ln{n})$
  - Average-case run time: expected
  - Lower-bound (best-case) run time: must examine each entry at least once, so $\Omega(n)$
- Comparison Tree
- Any comparison-based sorting can be represented by a comparison tree
- For any array instance, the sorting procedure passes along a path from the root to a certain leaf
- # of leaves $= n!$, therefore height $= \ln{n!} = \Theta(n\ln{n})$
- 5 Sorting Techniques
- Inversions: # of pairs that are not in order, i.e. $j < k$ but $a_j > a_k$
  - Expectedly, half of all pairs are inversions: $\frac{1}{2} \cdot \frac{n(n-1)}{2} = \frac{n(n-1)}{4} = O(n^2)$
  - # of inversions is denoted $d$
Stupid Sorting Algorithms
- Bogo Sort: randomly order the objects, check if sorted; if not, repeat.
  - On average $\Theta(n \cdot n!)$
- Bozo Sort: check if sorted; if not, randomly swap two entries, and repeat.
  - On average $\Theta(n!)$
Insertion Sort
Given a sorted list of length $k$, insert the next entry so that the first $k+1$ entries are sorted; repeat over the whole array.
Procedure
void insertionSort(Type *array, int const n) {
    for (int k = 1; k < n; ++k) {
        for (int j = k; j > 0; --j) {
            if (array[j - 1] > array[j])
                swap(array[j - 1], array[j]);
            else
                break;
        }
    }
}
Every `swap` removes exactly one inversion.
Performance
- Space: In-place
- Time
- Worst-case: $O(n^2)$
- Average-case: $\Theta(n + d)$
Bubble Sort
"Bubble up" the remaining smallest entry every time.
Procedure
void bubbleSort(Type *array, int const n) {
    for (int i = 0; i < n - 1; ++i) {
        for (int j = n - 1; j > i; --j) {
            if (array[j] < array[j - 1])
                swap(array[j], array[j - 1]);
        }
    }
}
Performance
- Space: In-place
- Time: in all cases $\Theta(n^2)$
Possible improvements
- Store the remaining smallest, avoid so many swaps
- Use a `bool` flag to check whether no swaps occurred (then already sorted)
- Alternate between "Bubbling" and "Sinking"
Heap Sort
Build a max-heap on the array, then repeatedly extract the maximum.
Procedure
void heapSort(Type *array, int const n) {
    max_heap = buildMaxHeapFloyd(array);
    for (int i = n - 1; i > 0; --i)
        array[i] = extractRoot(max_heap);
}
Performance
- Space
  - Originally: $\Theta(n)$ for the heap
  - In-place implementation: $\Theta(1)$
- Time
  - Worst-case / average: $O(n\ln{n})$
  - Best (most entries are the same): $\Theta(n)$
Merge Sort
Divide-and-Conquer: recursively divide the array, sort each half, then merge.
Procedure
`Merge` operation: combine two sorted sub-arrays into one
- Analysis
  - Space: $\Theta(n)$
  - Time: $\Theta(n_1 + n_2) = \Theta(n)$
void mergeSort(Type *array, int first, int last) {
    if (last - first <= N) {
        insertionSort(array, first, last);
    } else {
        int midpoint = (first + last) / 2;
        mergeSort(array, first, midpoint);
        mergeSort(array, midpoint, last);
        merge(array, first, midpoint, last);
    }
}
Use `insertionSort` for small subarrays (size $\le$ N).
Performance
- Space: $\Theta(n)$ for the extra array
- Time: in all cases $\Theta(n\ln{n})$
  - Recursion Tree
Quick Sort
Divide-and-Conquer: Recursively Partition.
Procedure
`Partition` operation: split the array around a pivot
- Analysis
  - Space: in-place
  - Time: $\Theta(n)$
void quickSort(Type *array, int first, int last) {
    if (last - first <= N) {
        insertionSort(array, first, last);
    } else {
        Type pivot = array[last];
        int i = first - 1;
        for (int j = first; j < last; ++j) {
            if (array[j] < pivot) {
                ++i;
                swap(array[i], array[j]);
            }
        }
        swap(array[i + 1], array[last]);
        quickSort(array, first, i);
        quickSort(array, i + 2, last);
    }
}
Use `insertionSort` for small subarrays (size $\le$ N).
Performance
- Space: in-place, BUT
  - Worst-case: $\Theta(n)$ for function call stack
  - Average-case: $\Theta(\ln{n})$ for function call stack
- Time
  - Worst-case (`pivot` is always extreme): $O(n^2)$
  - Average-case (`pivot` well chosen every time): $\Theta(n\ln{n})$
    - Use Median-of-three: choose `pivot` as the median of the first, middle and last elements
Possible improvements
Modify the Partition procedure to be:
- Choose `pivot` using Median-of-three
  - If `first` is the pivot, swap it with `middle`
  - If `last` is the pivot, swap it with `middle`
- Use two pointers, `low` at $1$, `high` at $n-2$
- Repeat:
  - Increment `low` until `array[low] > pivot`
  - Decrement `high` until `array[high] < pivot`
  - Swap `array[low]` with `array[high]`
- Until `low > high`
- Put `pivot` at `low`; put `array[low]` at `n-1`
Bucket Sort
Prerequisites
- Numbers must be in a certain range $O(m)$!!
- Not based on comparisons.
Procedure
Throw the numbers into $m$ buckets, then read the buckets back in order.

Counting Sort: count how many times each value occurs...
Performance
- Space: $\Theta(m)$
- Time: $\Theta(n + m)$
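The counting variant above can be sketched in a few lines: values must lie in $[0, m)$, and no comparisons ever happen, which is what makes the $\Theta(n + m)$ bound possible:

```cpp
#include <cassert>
#include <vector>

// Counting sort: count occurrences of each value in [0, m),
// then rewrite the array by reading the buckets back in order.
void countingSort(std::vector<int> &a, int m) {
    std::vector<int> count(m, 0);
    for (int v : a) ++count[v];          // throw each number into its bucket
    int i = 0;
    for (int v = 0; v < m; ++v)
        while (count[v]--) a[i++] = v;   // read buckets back in order
}
```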
Radix Sort
Prerequisites
- Numbers must be written as digits in a certain base $b$ (not necessarily 10)
- Numbers must be finitely long, i.e. in a certain range $O(m)$!!
- Not based on comparisons.
Procedure
Apply Bucket Sort digit by digit: first on the last digit, then the next, ..., finally on the first digit.
Use a Queue for each bucket.
Performance
- Space: $\Theta(n)$ (# of buckets $= b$)
- Time: in all cases $\Theta(n\log_{b}{m})$